ABSTRACT

The Greater Plains Collaborative (GPC) is composed of 
10 leading medical centers repurposing the research 
programs and informatics infrastructures developed 
through Clinical and Translational Science Award 
initiatives. Partners are the University of Kansas Medical 
Center, Children's Mercy Hospital, University of Iowa 
Healthcare, the University of Wisconsin-Madison, the 
Medical College of Wisconsin and Marshfield Clinic, the 
University of Minnesota Academic Health Center, the 
University of Nebraska Medical Center, the University of 
Texas Health Sciences Center at San Antonio, and the 
University of Texas Southwestern Medical Center. The 
GPC network brings together a diverse population of 
10 million people across 1300 miles covering seven 
states with a combined area of 679 159 square miles.
Using input from community members, breast cancer 
was selected as a focus for cohort building activities. 
In addition to a high-prevalence disorder, we also 
selected a rare disease, amyotrophic lateral sclerosis. 



The Greater Plains Collaborative (GPC) brings 
together over 10 million covered lives and encom- 
passes over 20 hospitals, 700 clinic locations, and 
8000 providers responsible for tertiary and quater- 
nary care in most regions, primary care for specific 
populations, and comprehensive management and 
follow-up for patients with rare diseases (such as 
amyotrophic lateral sclerosis (ALS)) and for those 
with our selected common disease, breast cancer. Of 
these, over 6 million have data, such as laboratory 
results, medications, vital signs and diagnoses, main- 
tained in electronic health records (EHRs). This 
population covers the spectrum from primary care 
networks serving rural and small communities to 
urban populations with significant African-American 
and Hispanic representation. The centers at 
Wisconsin, Kansas, Nebraska, and Minnesota also 
have active liaisons with their respective state's Native 
American populations. With one current exception
(Children's Mercy Hospital), all of our sites include
comprehensive pediatric as well as adult care.

Figure 1 illustrates the GPC's data sources, tech- 
nical components, and governance. Each site in the 
GPC has existing processes and governance 
between their research and healthcare system orga- 
nizations to support clinical trials and the reuse of 
health record data for research. Existing resources 
are shown in black, new site data sources that can 
supplement longitudinal data capture in green, new 
components to be deployed at the sites in red, and 
GPC-level data stores and components in blue. 

Site-level governance is an essential part of the 
GPC and involves: 



► Institutional review boards (IRBs), which oversee 
identified data requests and prospective trials 

► Data request oversight committees, which 
incorporate healthcare system and university 
oversight of data requests and approve fully dei- 
dentified data requests that are classified as 
being outside the scope of human subject 
research by the IRB. After data requests are 
approved, a neutral member of the informatics 
team, the 'honest broker', extracts data from the 
data repository for the researcher 

► University- and hospital-based biospecimen 
resource request processes governing the release 
of samples 

► Healthcare system EHR steering committees, 
which oversee the configuration and standard- 
ization of clinical systems 

► Clinical and translational science committees, 
which govern the use of registries for prospect- 
ive trial recruitment 

There are three potential additional areas (green 
in figure 1) for incorporating data: health informa- 
tion exchanges (HIEs), Medicare claims data on 
care received outside our health systems and avail- 
able to accountable care organizations, and state 
Medicaid claims. 

To create a highly productive and responsive 
network, the GPC will integrate the following com- 
ponents (red in figure 1) at each site with our exist- 
ing EHRs, i2b2 (Informatics for Integrating Biology 
and the Bedside) data repositories, data-capture 
systems, and personal health records used for 
patient registries and engagement: 

1. Data standardization: the concept paths used by 
i2b2 to describe observations and findings will 
be harvested along with usage statistics to share 
at the GPC level 

2. Deidentified dataset extraction: for cohort char- 
acterization, we have developed a lightweight 
i2b2 plug-in to be used by each site's honest 
broker to extract cohort datasets and securely 
transfer them to the GPC data store for analysis 

3. Patient-reported outcome measures (PROMs): 
standardized measures will be deployed using 
either EHR patient portals or data collection 
instruments for existing registry and research 
management systems such as REDCap 
(Research Electronic Data Capture) 

4. Comparative effectiveness research (CER) trial 
components: we will configure CER trial com- 
ponents directly in the EHR (preferred) or inte- 
grate existing data capture and trial management 
systems such as REDCap, Velos, and OnCore 
because of either limited EHR build team cap- 
acity or flexibility to efficiently iterate prototypes 

GPC governance with national intersection: alignment of local site governance (DROC/IRB) with GPC and reciprocal deferral IRB.

Each site, through its clinical and translational science institute and health system, has
existing approaches to governance and oversight. This involves:

A) IRB protocol and oversight of the honest broker protocol governing the data repository

B) IRB oversight for identified data requests

C) IRB oversight for prospective trials

D) Health system oversight for data requests (deidentified and identified)

E) Clinical research oversight for contacting prospective trial patients

F) Health system CMIO/CIO oversight of EMR modifications and components

Legend:

Black items are current site processes/systems.
Green items are data sources which might be piloted at each site, but not deployed across the network.
Red items are new components deployed at each site across the network.
Blue items are components deployed centrally.
Purple lines show the feedback processes to configure sites for PROM, CER, and coordinating amongst biospecimen repositories.


Figure 1 Greater Plains Collaborative network components. Existing resources are shown in black, new site data sources that can supplement 
longitudinal data capture in green, new components to be deployed at the sites in red, and GPC-level data stores and components in blue. 



5. Limited dataset extraction: methods similar to item 2 above,
except that CER trials will require precise dates and times to
monitor accrual and performance.
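
The contrast between the deidentified extracts of item 2 and the limited datasets of item 5 comes down largely to how dates are handled. The following Python sketch illustrates one way this could look, assuming a deidentified extract keeps only the year of each observation while a limited dataset keeps the full timestamp. The function name, field names, and dataset labels are hypothetical and do not describe the GPC plug-in itself.

```python
from datetime import datetime


def prepare_observation(obs, dataset_type):
    """Return a copy of obs with its date handled according to dataset_type.

    obs is a dict with 'patient_num', 'concept', and 'start' (a datetime).
    The 'deidentified' and 'limited' labels are illustrative assumptions.
    """
    out = dict(obs)
    if dataset_type == "deidentified":
        # Coarsen to the year only, so no precise date leaves the site.
        out["start"] = obs["start"].year
    elif dataset_type == "limited":
        # Keep the precise date and time needed to monitor trial accrual.
        out["start"] = obs["start"].isoformat()
    else:
        raise ValueError(f"unknown dataset type: {dataset_type}")
    return out


# Hypothetical observation used only for illustration.
obs = {
    "patient_num": 17,
    "concept": "LOINC:2823-3",
    "start": datetime(2013, 5, 14, 9, 30),
}
deid = prepare_observation(obs, "deidentified")
limited = prepare_observation(obs, "limited")
```

Here `deid` carries only the year, whereas `limited` retains the timestamp a trial coordinator would need to compute accrual over time.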

A new network must start by building trust, so we will initially
limit the data handled at the network level and focus on
establishing governance and interoperability while trust among our
sites develops. The first step is to establish an overall
master data-sharing agreement, based on examples of
existing University of Kansas Medical Center (KUMC)
data-sharing agreements. We also plan to develop an
IRB reciprocal-deferral model for the network and to
deploy the following activities/components at the GPC level
(blue in figure 1):

1. An i2b2 1 ontology database: to store the terminologies used
at each site, but not patient data. We will harvest the terms
used at the sites and statistics of the number of facts and
patients observed for each term. This will allow us to
measure overall network alignment with national standards,
and to map and monitor processes to increase data harmonization
in an iterative manner.

2. Data request oversight tools: based on KUMC's
REDCap 2 -based tools, for GPC-level oversight of
data requests, biospecimen requests, and tracking of CER trial
approvals. These tools allow an authenticated faculty member to
sponsor student/staff access and submit data use requests via 
REDCap surveys, which are then reviewed by hospital, 
clinic, and university oversight officials, who use 
organization-specific case report forms to approve access and 
data requests. A final case report form is used by the honest 
broker to track data fulfillment, the i2b2 queries used to 
define the cohort and data elements, and the patients 
included in the released datasets. 

3. A REDCap data store and RStudioServer 3 analysis suite: to 
maintain aggregate deidentified datasets for cohort 
characterization. 

4. A development environment: to configure generalized 
patient-reported outcome modules within EHRs and 
research registry tools at the sites. For prototyping measures
in REDCap, the common instance for deidentified data will 
be used. For modules deployed using the EHRs, a develop- 
ment environment will be configured with a site's EHR. 

5. A development environment: to configure generalized CER 
modules to be deployed in the EHRs at the sites. Where 
applicable, this will use the common REDCap and 
RStudioServer environments but augmented by GPC devel- 
opment environments for services (web services/interface 
engines) and stubs for application programming interfaces to 
site EHRs. 

6. A REDCap data store and RStudioServer analysis suite: to 
maintain aggregate limited datasets for monitoring prospect- 
ive CER trials. We will work with the Patient-Centered 
Outcomes Research Institute (PCORI) and the national 
coordinating center to align these tools with national objec- 
tives and trial design considerations. 
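
The ontology database in item 1 holds terms and counts rather than patient rows. The Python sketch below shows one way per-site harvests could be merged into a network-level summary. The site names, concept paths, and counts are hypothetical, and summing patient counts assumes that patients do not overlap between sites.

```python
from collections import defaultdict


def merge_site_term_stats(site_reports):
    """Combine per-site term statistics into a network-level summary.

    site_reports maps a site name to a list of
    (concept_path, fact_count, patient_count) tuples. Only counts are
    shared; no patient-level rows are involved.
    """
    network = defaultdict(lambda: {"facts": 0, "patients": 0, "sites": []})
    for site, terms in site_reports.items():
        for path, facts, patients in terms:
            entry = network[path]
            entry["facts"] += facts
            # Summing assumes patients do not overlap between sites.
            entry["patients"] += patients
            entry["sites"].append(site)
    return dict(network)


# Hypothetical harvests from two sites.
POTASSIUM = r"\GPC\Labs\LOINC:2823-3"
LOCAL_TEST = r"\GPC\Labs\LOCAL:XYZ"
site_reports = {
    "site_a": [(POTASSIUM, 120, 40), (LOCAL_TEST, 10, 5)],
    "site_b": [(POTASSIUM, 80, 25)],
}
summary = merge_site_term_stats(site_reports)
```

A term reported by only one site, such as `LOCAL_TEST` here, is a natural candidate for mapping work in the next harmonization iteration.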

Since 2011, the KUMC medical informatics team has distribu- 
ted the open-source HERON 4 framework for migrating transac- 
tional data into a vendor-neutral i2b2 data warehouse. This is 
valuable to collaborators using the Epic EHR, as well as organi- 
zations that use standard datasets (eg, North American 
Association of Central Cancer Registries (NAACCR) tumor 
registries, the Social Security Administration's Death Master File
for mortality, the University HealthSystem Consortium Clinical 
Data Base). The team has devised methods 5 for incorporating 
locally developed REDCap research registries within the i2b2 
environment and for integrating preliminary statistical analysis 
into the i2b2 framework using the open-source R language. 6 
This integration will be used for data exchange between GPC 
sites and GPC data stores. 

The GPC sees PCORI's Clinical Data Research Networks 
(CDRNs) as a test of the nation's multi-billion dollar investment 
in EHRs. While the Office of the National Coordinator requires 
attestation that a provider organization has implemented a certi- 
fied EHR, there has been little quantitative measurement regard- 
ing the degree to which the data contained in EHRs are capable 
of being used to measure clinical effectiveness. While we laud 
the efforts to create HIEs, there is concern that such exchanges 
may devolve to the lowest common denominator of interoper- 
ability and lack the rich detail and structure required to support 
research. In contrast, the GPC sites obtain the underlying 
detailed observations recorded directly in EHR and billing 
systems, standardized registries, biorepository databases, and 
supplemental electronic data-capture methods. CDRNs will 
provide a true test of our emerging national learning healthcare 
system by developing targeted trials for specific clinical popula- 
tions and outcomes. The GPC network standards will adhere to 
national and international data standards specified by the 
nationwide health information network (NwHIN) 7 and subse- 
quent guidance provided by the Office of the National 
Coordinator for Health Information Technology and outlined 
by meaningful use stage 2 (MU2 8 ) and stage 3 criteria. 

We use i2b2 as a common data model to consolidate data 
from (a) EHRs, (b) administrative 'billing' data and derived 
benchmarking datasets such as the University HealthSystem 
Consortium Clinical Data Base (UHC CDB), 9 (c) research regis- 
tries (eg, Tumor Registries) and (d) PROMs (prototyped in 
REDCap). We bind both the internal EHR concept codes and 
mapped code sets to standard terminologies into i2b2 so we can 
quantitatively measure MU2 attainment based on both concept 
coverage and the amount of observed data. 
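
The two measures named above, concept coverage and the amount of observed data, can diverge considerably, because a few heavily used terms may account for most facts. The Python sketch below shows one plausible way to compute both. The function name and the term list are hypothetical; only the two standard codes are taken from the examples in this article.

```python
def alignment(terms):
    """Return (concept_coverage, fact_coverage) as fractions between 0 and 1.

    terms is a list of dicts with 'facts' (int) and 'standard_code'
    (a string, or None when the term has no standard mapping).
    """
    if not terms:
        return 0.0, 0.0
    mapped = [t for t in terms if t["standard_code"]]
    # Share of distinct terms that carry a standard code.
    concept_coverage = len(mapped) / len(terms)
    total_facts = sum(t["facts"] for t in terms)
    # Share of observed facts recorded against a mapped term.
    fact_coverage = (
        sum(t["facts"] for t in mapped) / total_facts if total_facts else 0.0
    )
    return concept_coverage, fact_coverage


# Hypothetical term list: two mapped terms and two local-only terms.
terms = [
    {"facts": 900, "standard_code": "LOINC:2823-3"},
    {"facts": 80, "standard_code": "RXCUI:370577"},
    {"facts": 15, "standard_code": None},
    {"facts": 5, "standard_code": None},
]
concept_cov, fact_cov = alignment(terms)
```

In this made-up case half of the terms are mapped, yet those terms account for nearly all of the observed facts, which is why both measures are worth reporting.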

Figure 2 provides an example of an amoxicillin chewable
tablet concept which has an internal code (452). This code has
mappings to a First Databank code (gcnseqno 9001) for allergy
checking, as well as one-to-many relationships to various
National Drug Codes (NDCs) stocked by the pharmacies (NDC
54868-3105-0 manufactured by Physicians Total Care; NDC
0093-2268-01 manufactured by Teva USA). An EHR may have
5% of its medication formulary aligned with interoperable stand- 
ard medication terminology. As an interim technique, these data 
can still be integrated for multisite queries by using the flexibility 
of the i2b2 data model to map local terminology codes to inter- 
operable standards. Using existing concept mapping techniques, 
as well as select manual mappings illustrated in figure 3, the i2b2 
common data are expected to achieve 95% alignment. In our 
example, the medication concept is mapped to an RxNorm 
Semantic Clinical Drug Form (RXCUI 370577), 10 facilitating 
query across different dispense sizes. 
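
As an illustration of querying across dispense sizes, the Python sketch below rolls local medication codes up to a shared RxNorm identifier. The fact counts are those legible for the three amoxicillin chewable entries in figure 3, and the RXCUI is the one cited above. The local code strings and function name are hypothetical, and this is not the i2b2 query mechanism itself.

```python
from collections import Counter

# Hypothetical local codes for three dispense sizes, all mapped to the
# RxNorm Semantic Clinical Drug Form cited in the text.
LOCAL_TO_RXNORM = {
    "LOCAL:AMOX_125_CHEW": "RXCUI:370577",
    "LOCAL:AMOX_200_CHEW": "RXCUI:370577",
    "LOCAL:AMOX_250_CHEW": "RXCUI:370577",
}

# Fact counts as legible in figure 3 for these three entries.
local_fact_counts = {
    "LOCAL:AMOX_125_CHEW": 45,
    "LOCAL:AMOX_200_CHEW": 20,
    "LOCAL:AMOX_250_CHEW": 556,
}


def facts_by_standard_code(local_counts, mapping):
    """Sum fact counts under each standard code; unmapped codes are pooled."""
    totals = Counter()
    for code, count in local_counts.items():
        totals[mapping.get(code, "UNMAPPED")] += count
    return totals


totals = facts_by_standard_code(local_fact_counts, LOCAL_TO_RXNORM)
```

A query phrased against `RXCUI:370577` then reaches all three dispense sizes without the investigator needing to know any site's local codes.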

We will share our findings and measurement framework with 
the chief medical information officers who must configure the 
EHRs to comply with MU2. The timing of federal incentive 
payments will catalyze this activity. Once the MU2 work is com- 
plete, we might see 94% alignment natively within the EHR, 
enabling deployment of standardized CER trial components and 
PROMs within the clinical workflow. By incorporating existing 
standard research registries, such as the NAACCR tumor regis- 
try, we can also directly evaluate MU2-compliant EHRs' and 
billing systems' abilities to represent existing research informa- 
tion models. Breast cancer provides an ideal exemplar. 

This work is possible because of the increasing support of the 
NwHIN domain model by EHR vendors. The GPC network 
will allow us to design around Epic EHR considerations and 
then to generalize our approach to two other EHR systems 
(Cerner at Children's Mercy and Cattails MD at the Marshfield 
Clinic). For diagnoses, a vendor, Intelligent Medical Objects, 
provides mapping for SNOMED CT 11 and International 
Classification of Diseases — Clinical Modification (ICD-*-CM) 12 
coding of diagnoses and history. However, mapping of family 
history, allergy records, findings, and procedures to SNOMED 
CT will be required. For medication orders and prescriptions, 
Epic partners with First Databank on mappings to RxNorm.
RxNorm is in varying stages of deployment among our sites. All 
sites are responsible for mapping of immunizations to Centers 
for Disease Control and Prevention (CDC) Vaccines 
Administered (CVX) codes. 13 14 For laboratory results, 
LOINC 15 is installed within Epic at each site's discretion and 
requires mapping at the site of laboratory master tables to 
LOINC for query access to coded LOINC results as well as 
deployment in any enterprise laboratory information systems. 
For procedures, Epic provides and supports coding to 
ICD-*-CM, ICD-*-PCS (Procedure Coding System), CPT 
(Current Procedural Terminology) and Healthcare Common 
Procedure Coding System (HCPCS) as part of the model 
system, with sites responsible for installing yearly updates. 

Benchmarking activities derived from administrative/billing 
data sources and national registries provide additional sources of 
data for the GPC. Specifically, the UHC CDB provides rich, stan- 
dardized diagnoses, encounter details, and outcomes derived 
from billing, while the NAACCR — used by tumor registrars — 
provides established mechanisms for characterizing the breast 
cancer population predominantly codified by the International 
Classification of Diseases for Oncology. 16 Targeting these 
standardized datasets allows us to create an ETL (extract, trans- 
form, load) process, which benefits the national network, is 
vendor agnostic, and will enable direct comparison of 
EHR-derived network capabilities with administrative data and 
abstracted research registries traditionally used in health service
research. 17-20 This also complements efforts at sites to incorporate
billing information with that from financial systems. Further,
we will develop ETL processes for incorporating Medicare and
Medicaid claims data into i2b2 based on standard data formats.

Figure 2 Greater Plains Collaborative (GPC) interoperable standardization measurement framework.

Glucose (Group:GLU) [1874015 facts; 100774 patients]
Potassium (Group:K) [780042 facts; 96673 patients]
    POTASSIUM (COMPONENT_ID:2002) [780042 facts; …
    POTASSIUM (LOINC:2823-3) [780042 facts; 96673 p…
Sodium (Group:NA) [771147 facts; 96590 patients]
Amoxicillin / Clavulanate Oral Tablet [85,947 fact…
Amoxicillin Chewable Tablet [6,648 facts; 2,764 …
    AMOXICILLIN 125 MG PO CHEW [45 facts; 1…
    AMOXICILLIN 200 MG PO CHEW [20 facts]
    AMOXICILLIN 250 MG PO CHEW [556 facts; …

Figure 3 Flexible terminology mappings in Informatics for Integrating 
Biology and the Bedside (i2b2). 



Building EHR functionality, or the capacity of healthcare system
information technology teams, to collect PROMs via patient
portals or to integrate with patients' personal health records is
often a lower priority than documentation during the healthcare
encounter. For rare diseases (such as ALS) and even common
conditions (such as breast cancer), key data elements (eg,
performance status for cancer) are often not captured discretely in the EHR.
We will use REDCap, deployed at all GPC sites, as a simple user
interface to prototype codification of data-capture instruments
such as PROMs and the National Institute of Neurological
Disorders and Stroke ALS common data elements.

We do not see a requirement for real-time interfaces between 
sites and the GPC centrally to fulfill the initial objectives. 
Instead, site honest brokers will use an open-source i2b2 
plug-in, RDataBuilder, to extract data into a standardized R data 
frame. We believe this honest-broker-mediated approach will 
suffice for initial development of CER trial monitoring, but we 
remain open to developing more scalable approaches grounded 
in practical experience from running trials as part of the 
national network. 
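
To make the idea of a standardized data frame concrete, the Python sketch below pivots observation facts into one row per patient with one column per requested variable. It is an analogue written for illustration only: RDataBuilder itself produces R data frames, and the function name, variable names, and values here are hypothetical.

```python
def build_data_frame(facts, variables):
    """Pivot (patient_num, concept, value) facts into one row per patient.

    Only concepts listed in variables are kept. A patient lacking a value
    for a variable gets None in that column; patients with no requested
    facts at all are omitted.
    """
    rows = {}
    for patient, concept, value in facts:
        if concept not in variables:
            continue
        row = rows.setdefault(
            patient, {"patient_num": patient, **{v: None for v in variables}}
        )
        # If a patient has several facts for a concept, the last one wins.
        row[concept] = value
    return [rows[p] for p in sorted(rows)]


# Hypothetical facts for two patients.
facts = [
    (1, "potassium", 4.1),
    (1, "sodium", 140),
    (2, "potassium", 3.8),
    (2, "glucose", 95),
]
frame = build_data_frame(facts, ["potassium", "sodium"])
```

Because every site would emit the same columns in the same order, extracts produced independently by each honest broker could be stacked centrally without a real-time interface.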